Appl. No. 10/611,344 

Amdt. dated October 4, 2007 

Reply to Office action of April 4, 2007 

Amendments to the Claims: 

This listing of claims will replace all prior versions, and listings, of claims in the application: 
Listing of Claims: 

1. (Currently Amended) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 bits, a first 

register storing a first operand having a set of L data elements and designating, with 3 

bits, a second register storing a second operand having a set of L control elements, 

wherein the first operand and second operand are of a same size and each of the L data 

elements and L control elements are of a same size, and wherein each one of the L 

control elements is divided into three portions, the first portion being a flush to zero bit 

occupying the most significant bit of each control element wherein the flush to zero bit 

alone controls whether a resultant element is flushed to zero, the second portion being a 

position selection field that is at least log₂ L bits wide and indicates a position of one of

said L data elements, and a third portion reserved for another purpose,

storing a resultant operand in said first register having L resultant data elements of the 

same size as the L data elements and the L control elements to shuffle data in said 

first register without a modification to the L control elements of the second operand 

stored in the second register, wherein the value of each resultant data element is

controlled by the position selection field of the L control elements in the same 

position as the resultant data element, and is either, 

the one of the L data elements designated by the position selection field of 

said control element if said control element's flush to zero bit is not set; or 
a zero if said control element's flush to zero bit is set. 
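
By way of illustration only, and not as part of the claim listing, the operation recited in claim 1 can be modeled in a few lines of Python. The function name `packed_shuffle`, the use of byte-wide elements, the restriction to a power-of-two L, and the placement of the position selection field in the low-order bits of each control element are assumptions made for this sketch; the claim itself requires only that the field be at least log₂ L bits wide.

```python
def packed_shuffle(first_reg: bytearray, second_reg: bytes) -> None:
    """Shuffle first_reg in place under control of second_reg (hypothetical model)."""
    L = len(first_reg)
    if len(second_reg) != L:
        raise ValueError("operands must hold the same number of elements")
    if L == 0 or L & (L - 1):
        raise ValueError("this sketch assumes L is a power of two")

    select_mask = L - 1  # low log2(L) bits: assumed location of the selection field
    result = bytearray(L)
    for i, ctl in enumerate(second_reg):
        if ctl & 0x80:
            # Flush to zero bit (most significant bit) alone forces a zero.
            result[i] = 0
        else:
            # Otherwise copy the data element named by the selection field.
            result[i] = first_reg[ctl & select_mask]

    # The resultant overwrites the first register; second_reg is never modified.
    first_reg[:] = result
```

The resultant is built in a temporary buffer before being written back, so every selection reads the original contents of the first register even though that register is also the destination.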

2. (Cancelled)

3. (Cancelled) 

4. (Previously Presented) The method of claim 1 wherein said control element is to designate a 
first operand data element by a data element position number. 

5. (Cancelled) 

6. (Cancelled) 

7. (Previously Presented) The method of claim 1 further comprising outputting a resultant data 
block comprising data that was shuffled from said first operand in response to said control 
elements of said second operand. 

8. (Original) The method of claim 1 wherein each of said data elements comprises a byte of data. 

9. (Original) The method of claim 8 wherein each of said control elements is a byte wide. 

10. (Original) The method of claim 9 wherein L is 8 and wherein said first operand, said second 
operand, and said resultant are each comprised of 64-bit wide packed data. 

11. (Original) The method of claim 9 wherein L is 16 and wherein said first operand, said second 
operand, and said resultant are each comprised of 128-bit wide packed data. 
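
As an informal illustration of the byte-wide control elements of claims 9 through 11, the following Python sketch splits one control byte into the three portions recited in claim 1. The helper name `decode_control_byte` and the assumption that the position selection field occupies the low-order log₂ L bits (3 bits when L is 8, 4 bits when L is 16) are hypothetical and are not taken from the claims.

```python
def decode_control_byte(value: int, L: int) -> tuple[int, int, int]:
    """Return (flush_to_zero, position, remaining_bits) for one control byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("control element must fit in one byte")
    select_bits = (L - 1).bit_length()          # log2(L) when L is a power of two
    flush_to_zero = (value >> 7) & 1            # most significant bit
    position = value & ((1 << select_bits) - 1)  # assumed low-order selection field
    remaining = (value & 0x7F) >> select_bits    # third portion: bits in between
    return flush_to_zero, position, remaining
```

Under these assumptions a byte-wide control element leaves 4 bits in the third portion when L is 8 and 3 bits when L is 16.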

12. (Currently Amended) An apparatus comprising: an execution unit to execute a single packed 
shuffle instruction designating, with 
3 bits, a first register storing a first operand comprised of a set of L data elements and

designating, with 3 bits, a second register storing a second operand comprised of a set of 

L control elements, wherein the first operand and second operand are of a same size and 

each of the L data elements and L control elements are of a same size, and wherein each 

one of the L control elements is divided into three portions, the first portion being a flush 

to zero bit occupying the most significant bit of each control element wherein the flush to 

zero bit alone controls whether a resultant element is flushed to zero, the second portion 

being a position selection field that is at least log₂ L bits wide and indicates a position of

one of said L data elements, and a third portion, said shuffle instruction to cause said 

execution unit to shuffle data in said first register without a modification to the L control 

elements of the second operand stored in the second register and to store a resultant 

operand in said first register having L resultant data elements of the same size as the L 

data elements and the L control elements, wherein the value of each resultant data 

element is controlled by the position selection field of the L control elements in the same 

position as the resultant data element, and is either, 

a zero if said control element's flush to zero bit is true, otherwise 

the one of the L data elements designated by the position selection field of 

said individual control element. 

13. (Original) The apparatus of claim 12 wherein each of said L control elements occupies a 
position in said second operand and is associated with a similarly located data element position 
in a resultant. 

14. (Original) The apparatus of claim 13 wherein each individual control element is to designate
a first operand data element by a data element position number. 

15. (Cancelled) 

16. (Cancelled) 

17. (Previously Presented) The apparatus of claim 12 wherein said shuffle instruction is to 
further cause said execution unit to generate a resultant having L data element positions that have 
been filled based on said set of L control elements. 

18. (Original) The apparatus of claim 12 wherein each of said data elements comprises a byte of 
data and each of said control elements is a byte wide. 

19. (Original) The apparatus of claim 18 wherein L is 8 wherein said first operand, said second 
operand, and said resultant are each comprised of 64-bit wide packed data. 

20. (Original) The apparatus of claim 18 wherein L is 16 and wherein said first operand, said 
second operand, and said resultant are each comprised of 128-bit wide packed data. 

21. (Currently Amended) An article of manufacture comprising a machine readable storage 
medium that stores data, that when accessed by a machine, causes the machine to perform 
operations comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 bits, a first 
register storing a first operand having a set of L data elements and designating, with 3 
bits, a second register storing a second operand having a set of L control elements, 
wherein the first operand and second operand are of a same size and each of the L data 
elements and L control elements are of a same size, and wherein each one of the L

control elements is divided into three portions, the first portion being a flush to zero bit 

occupying the most significant bit of each control element wherein the flush to zero bit 

alone controls whether a resultant element is flushed to zero, the second portion being a 

position selection field that is at least log₂ L bits wide and indicates a position of one of

said L data elements, and a third portion, storing a resultant operand in said first register 

having L resultant data elements of the same size as the L data elements and the L control 

elements to shuffle data in said first register without a modification to the L control 

elements of the second operand stored in the second register, wherein the value of each

resultant data element is controlled by the position selection field of the L control 

elements in the same position as the resultant data element, and is either, 

the one of the L data elements designated by the position selection field of 

said control element if said control element's flush to zero bit is not set; or 

a zero if said control element's flush to zero bit is set.

22. (Previously Presented) The article of manufacture of claim 21 wherein said data stored by 
said machine readable storage medium represents an integrated circuit design, which when 
fabricated performs said predetermined function in response to a single instruction. 

23. (Previously Presented) The article of manufacture of claim 22 wherein said machine readable 
storage medium further includes data, that causes the machine to perform operations further 
comprising: 

generating a resultant having L data element positions that have been filled in
accordance to said set of L control elements.

24. (Previously Presented) The article of manufacture comprising the machine readable storage 
medium of claim 23 wherein each of said L control elements is associated with a similarly 
located data element position in a resultant. 

25. (Previously Presented) The article of manufacture comprising the machine readable storage 
medium of claim 24 wherein each individual control element is to designate a first operand data
element by a data element position number. 

26. (Previously Presented) The article of manufacture comprising the machine readable storage 
medium of claim 25 wherein each of said data elements comprises a byte of data. 

27. (Cancelled) 

28. (Cancelled) 

29. (Previously Presented) The article of manufacture of claim 21 wherein said data stored by
said machine readable storage medium represents a computer instruction, which, if executed by a 
machine, causes said machine to perform said predetermined function. 

30. (Currently Amended) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 

bits, a first register storing a first operand having a set of L data elements and 
designating, with 3 bits, a second register storing a second operand having a set of L 
masks, wherein the first operand and second operand are of a same size and each of 
the L data elements and L masks are of a same size, and wherein each one of the L

masks is divided into three portions, the first portion being a flush to zero bit 

occupying the most significant bit of each control element wherein the flush to zero 

bit alone controls whether a resultant element is flushed to zero, the second portion 

being a position selection field that is at least log₂ L bits wide and indicates a position

of one of said L data elements, and a third portion, and wherein each of said L masks 

occupies a particular position in said second operand and is associated with a 

similarly located data element position in a resultant operand, storing the resultant 

operand in said first register having L resultant data elements of the same size as the 

L data elements and the L masks to shuffle data in said first register without a 

modification to the L masks of the second operand stored in the second register,

wherein the value of each resultant data element is controlled by the position 

selection field of the L masks in the same position as the resultant data element, and 

is either, 

a zero if said mask's flush to zero bit is set; or 

if said mask's flush to zero bit is not set, the one of the L data elements designated by 
the position selection field of said mask to said associated resultant data element 
position. 

31.-33. (Cancelled) 

34. (Previously Presented) The method of claim 30 wherein said first operand, said second 
operand, and said resultant are each comprised of 64-bit wide packed data. 

35. (Previously Presented) The method of claim 30 wherein said first operand, said second
operand, and said resultant are each comprised of 128-bit wide packed data. 

36. (Currently Amended) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3 bits, a first 
register storing a first operand having a set of L data elements and designating, with 3 
bits, a second register storing a second operand having a set of L shuffle masks, wherein 
the first operand and second operand are of a same size and each of the L data elements 
and L masks are of a same size, and wherein each one of the L shuffle masks is divided 
into three portions, the first portion being a flush to zero bit occupying the most 
significant bit of each control element wherein the flush to zero bit alone controls 
whether a resultant element is flushed to zero, the second portion being a position 
selection field that is at least log₂ L bits wide and indicates a position of one of said L
data elements, and a third portion, and wherein each of said L shuffle masks is associated 
with a similarly located data element position in a resultant operand, storing the resultant 
operand in said first register having L resultant data elements of the same size as the L 
data elements and the L masks to shuffle data in said first register without a modification 
to the L control elements of the second operand stored in the second register, wherein the
value of each resultant data element is controlled by the position selection field of the L 
individual masks in the same position as the resultant data element, and is either, 

a zero if said mask's flush to zero bit is set, otherwise 

the one of the L data elements designated by the position selection field of said 
individual shuffle mask to said associated resultant data element position. 

37. (Cancelled)

38. (Cancelled) 

39. (Currently Amended) An apparatus comprising: 

a first memory location to store a plurality of source data elements; 

a second memory location to store a plurality of control elements, each of said control 
elements to correspond to a resultant data element position, and wherein each one of 
said control elements is divided into three portions, the first portion being a flush to 
zero bit occupying the most significant bit of each control element wherein the flush 
to zero bit alone controls whether a resultant element is flushed to zero, the second 
portion being a position selection field that is at least log₂ L bits wide and indicates a
position of one of said L data elements, and a third portion; 

control logic coupled to said first memory location and said second memory location, 
said control logic in response to the receipt of a single packed shuffle instruction
designating, with three bits, a first memory location storing a first operand having a 
set of L data elements and designating a second memory location storing a second 
operand having a set of L control elements, wherein the first operand and the second 
operand are of a same size and each of the L data elements and L control elements are 
of a same size, to generate a plurality of selection signals and a plurality of flush to 
zero signals, a zero signal generated when a control element's flush to zero bit is set; 

a first plurality of multiplexers coupled to said first memory location and said 

plurality of selection signals, each of said first plurality of multiplexers to store a 
resultant operand in said first memory location having L resultant data elements

of the same size as the L data elements and the L control elements, shuffled in

said first memory location without a modification to the L control elements,

wherein the value of each resultant data element is controlled by the position 

selection signal of the L control elements in the same position as the resultant data 

element, and is the one of the L data elements for a specific resultant data element 

position in response to a selection signal corresponding to said specific resultant 

data element position; and 

a second plurality of multiplexers coupled to said first plurality of multiplexers and to 
said plurality of flush to zero signals, each of said second plurality of multiplexers 
associated with a specific resultant data element position, each of said second 
plurality of multiplexers to output a zero if its flush to zero signal is active or to 
output a data element shuffled for that specific resultant data element position. 
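
Purely as an illustration, and not as part of the claim listing, the two-stage multiplexer arrangement of claim 39 can be modeled in Python as three small functions. All function names are hypothetical, L is assumed to be a power of two, and the selection field is assumed to sit in the low-order bits of each control element.

```python
def control_logic(control, L):
    """Derive one selection signal and one flush to zero signal per position."""
    select_mask = L - 1  # assumes L is a power of two
    selects = [c & select_mask for c in control]
    flushes = [bool(c & 0x80) for c in control]  # most significant bit of a byte
    return selects, flushes


def first_mux_stage(data, selects):
    """First plurality of multiplexers: pick a source element per position."""
    return [data[s] for s in selects]


def second_mux_stage(selected, flushes):
    """Second plurality of multiplexers: output zero where the flush signal is active."""
    return [0 if flush else value for value, flush in zip(selected, flushes)]


def shuffle_datapath(data, control):
    """Chain control logic and both multiplexer stages; inputs are left unmodified."""
    selects, flushes = control_logic(control, len(data))
    return second_mux_stage(first_mux_stage(data, selects), flushes)
```

In this model the first stage always selects some data element, and the flush to zero signal only overrides that selection in the second stage, which mirrors the separation between the two pluralities of multiplexers in the claim.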

40. (Original) The apparatus of claim 39 wherein said plurality of source data elements is a first 
packed data operand. 

41. (Original) The apparatus of claim 40 where said plurality of control elements is a second 
packed data operand. 

42. (Original) The apparatus of claim 40 wherein said first and second memory locations are a 
single instruction multiple data registers. 

43. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 64 bits long and each of said source data elements is a byte

wide; and 

said second packed operand is 64 bits long and each of said control elements is a byte 
wide. 

44. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 128 bits long and each of said source data elements is a byte 
wide; and 

said second packed operand is 128 bits long and each of said control elements is a byte 
wide. 

45. (Currently Amended) An apparatus comprising: 

control logic to receive a single packed shuffle instruction designating, with three bits, a first 
memory location storing a first operand having a set of M data elements and designating, 
with three bits, a second memory location storing a second operand having a set of L 
shuffle masks, wherein each of the M data elements and L shuffle masks are of a same 
size, and wherein each one of the L shuffle masks is divided into three portions, the first 
portion being a flush to zero bit occupying the most significant bit of each shuffle mask 
wherein the flush to zero bit alone controls whether a resultant element is flushed to zero, 
the second portion being a position selection field that is at least log₂ L bits wide, and a
third portion, and wherein each shuffle mask is associated with a unique resultant data 
element position controlled by the position selection field of said shuffle mask, said 
control logic to provide a select signal and a flush to zero signal for each resultant data

element position; 

a set of L multiplexers coupled to said control logic, wherein each multiplexer is also 
associated with a unique resultant data element position, each multiplexer to output to 
said first memory location data elements shuffled in said memory location without a 
modification to the L control elements, that is either, 

a zero if said shuffle mask's flush to zero signal is active or the one of the M data 
elements designated by the select signal of said shuffle mask if said shuffle mask's 
flush to zero signal is not inactive. 

46. (Original) The apparatus of claim 45 further comprising a register with L unique data 
element positions, each data element position to hold an output from its associated multiplexer. 

47. (Original) The apparatus of claim 46 wherein L is 16 and M is 16. 

48. (Currently Amended) A system comprising: 
a memory to store data and instructions; 

a processor coupled to said memory on a bus, said processor operable to perform 

a shuffle operation, said processor comprising: 

a bus unit to receive a single packed shuffle instruction, from said memory, said 

instruction to designate, with 3 bits, a first register storing L data elements from a first 
operand, and to designate, with three bits, L shuffle control elements from a second 
operand, wherein the first operand and second operand are of a same size and each of 
the L data elements and L control elements are of a same size, and wherein each one

of the L control elements is divided into three portions, the first portion being a flush

to zero bit occupying the most significant bit of each control element wherein the 

flush to zero bit alone controls whether a resultant element is flushed to zero, the 

second portion being a position selection field that is at least log₂ L bits wide and

indicates a position of one of the L data elements, and a third portion; 

an execution unit coupled to said bus unit, said execution unit to execute said single 
packed shuffle instruction, said single packed shuffle instruction to cause said 
execution unit to: 

store a resultant operand in said first register having L resultant data elements of the 
same size as the L data elements and the L control elements, wherein said
resultant is shuffled in said first register without a modification to the L control
elements of the second operand, wherein the value of each resultant data element
is controlled by the position selection field of the L control elements in the same 
position as the resultant data element, and is either, 

the one of the L data elements designated by the position selection field of said 
control element if said control element's flush to zero bit is not set; or 

a zero if said control element's flush to zero bit is set. 

49.-51. (Cancelled)

52. (Original) The system of claim 48 wherein each data element is a byte wide, each shuffle 
command element is a byte wide, and L is 8. 



Docket No: 42390P15762 

53. (Original) The system of claim 48 wherein said first operand is 64 bits long and said second 
operand is 64 bits long. 